NSF PAR Search | NSF Public Access Repository

Efficient High-Throughput DNA Breathing Features Generation Using Jax-EPBD

https://doi.org/10.1101/2024.12.06.627191

Inan, Toki Tahmid; Kabir, Anowarul; Rasmussen, Kim; Shehu, Amarda; Usheva, Anny; Bishop, Alan; Alexandrov, Boian; Bhattarai, Manish (December 2024, bioRxiv)

DNA breathing dynamics (transient base-pair opening and closing due to thermal fluctuations) are vital for processes like transcription, replication, and repair. Traditional models, such as the Extended Peyrard-Bishop-Dauxois (EPBD) model, provide insights into these dynamics but are computationally limited for long sequences. We present JAX-EPBD, a high-throughput Langevin molecular dynamics framework leveraging JAX for GPU-accelerated simulations, achieving up to a 30x speedup and superior scalability compared to the original C-based EPBD implementation. JAX-EPBD efficiently captures time-dependent behaviors, including bubble lifetimes and base flipping kinetics, enabling genome-scale analyses. Applying it to transcription factor (TF) binding affinity prediction using SELEX datasets, we observed consistent improvements in R² values when incorporating breathing features with sequence data. Validating on the 77-bp AAV P5 promoter, JAX-EPBD revealed sequence-specific differences in bubble dynamics correlating with transcriptional activity. These findings establish JAX-EPBD as a powerful and scalable tool for understanding DNA breathing dynamics and their role in gene regulation and transcription factor binding.
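The kind of Langevin simulation that JAX-EPBD accelerates can be illustrated with a small CPU-only sketch. It assumes the classic Peyrard-Bishop-Dauxois form of the potential (a Morse on-site term per base pair plus an anharmonic stacking term between neighbors) with illustrative parameter values in reduced units. It integrates the equations of motion with a simple semi-implicit Euler-Maruyama scheme and reports how often each base pair's displacement exceeds an opening threshold. The parameter values, the threshold of 1.5, and every function name here are assumptions for illustration. This is not the JAX-EPBD code, its API, or the fitted EPBD parameterization.

```python
import math
import random

# Illustrative PBD-style parameters in reduced units (assumed values, NOT the
# fitted EPBD parameter set). Per base type: Morse depth D and inverse width a.
MORSE = {"A": (0.05, 4.2), "T": (0.05, 4.2), "G": (0.075, 6.9), "C": (0.075, 6.9)}
K, RHO, BETA = 0.025, 2.0, 0.35  # stacking stiffness and anharmonicity (assumed)


def energy(y, seq):
    """Potential energy: Morse on-site terms plus anharmonic stacking terms."""
    u = 0.0
    for yi, base in zip(y, seq):
        depth, a = MORSE[base]
        u += depth * (math.exp(-a * yi) - 1.0) ** 2
    for i in range(1, len(y)):
        d = y[i] - y[i - 1]
        u += 0.5 * K * (1.0 + RHO * math.exp(-BETA * (y[i] + y[i - 1]))) * d * d
    return u


def forces(y, seq):
    """Analytic forces -dU/dy_n with free (open) chain ends."""
    f = [0.0] * len(y)
    for i, base in enumerate(seq):
        depth, a = MORSE[base]
        e = math.exp(-a * y[i])
        f[i] += 2.0 * a * depth * e * (e - 1.0)
    for i in range(1, len(y)):
        d = y[i] - y[i - 1]
        e = math.exp(-BETA * (y[i] + y[i - 1]))
        common = -0.5 * K * RHO * BETA * e * d * d  # from the anharmonic prefactor
        spring = K * (1.0 + RHO * e) * d            # from the (y_i - y_{i-1})**2 factor
        f[i] -= common + spring      # dW/dy_i     = common + spring
        f[i - 1] -= common - spring  # dW/dy_{i-1} = common - spring
    return f


def langevin(seq, steps=20000, dt=0.01, gamma=0.05, kT=0.0267, mass=1.0, seed=0):
    """Semi-implicit Euler-Maruyama for m*y'' = F - m*gamma*y' + thermal noise.

    kT = 0.0267 is roughly k_B*T at 310 K if energies are read as eV (assumed).
    """
    rng = random.Random(seed)
    y = [0.0] * len(seq)
    v = [0.0] * len(seq)
    kick = math.sqrt(2.0 * gamma * kT * dt / mass)  # fluctuation-dissipation scale
    traj = []
    for _ in range(steps):
        f = forces(y, seq)
        for i in range(len(y)):
            v[i] += dt * (f[i] / mass - gamma * v[i]) + kick * rng.gauss(0.0, 1.0)
            y[i] += dt * v[i]
        traj.append(list(y))
    return traj


def opening_profile(traj, threshold=1.5):
    """Fraction of frames in which each base pair's displacement exceeds threshold."""
    counts = [0] * len(traj[0])
    for frame in traj:
        for i, yi in enumerate(frame):
            if yi > threshold:
                counts[i] += 1
    return [c / len(traj) for c in counts]


profile = opening_profile(langevin("GCGCATATATGCGC", steps=5000))
print([round(p, 3) for p in profile])
```

Checking `forces` against a finite-difference gradient of `energy` is a cheap way to validate the derivation. A GPU version would typically vectorize the per-base loops and run many independent trajectories in parallel.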
Full Text Available
Scalable DNA Feature Generation and Transcription Factor Binding Prediction via Deep Surrogate Models

https://doi.org/10.1101/2024.12.06.626709

Kabir, Anowarul; Inan, Toki Tahmid; Rasmussen, Kim; Shehu, Amarda; Usheva, Anny; Bishop, Alan; Alexandrov, Boian; Bhattarai, Manish (December 2024, bioRxiv)

Simulating DNA breathing dynamics, such as those described by the Extended Peyrard-Bishop-Dauxois (EPBD) model, across the entire human genome using traditional biophysical methods like pyDNA-EPBD is computationally prohibitive due to intensive techniques such as Markov Chain Monte Carlo (MCMC) and Langevin dynamics. To overcome this limitation, we propose a deep surrogate generative model utilizing a conditional Denoising Diffusion Probabilistic Model (DDPM) trained on DNA sequence-EPBD feature pairs. This surrogate model efficiently generates high-fidelity DNA breathing features conditioned on DNA sequences, reducing computational time from months to hours, a speedup of over 1000 times. By integrating these features into the EPBDxDNABERT-2 model, we enhance the accuracy of transcription factor (TF) binding site predictions. Experiments demonstrate that the surrogate-generated features perform comparably to those obtained from the original EPBD framework, validating the model's efficacy and fidelity. This advancement enables real-time, genome-wide analyses, significantly accelerating genomic research and offering powerful tools for disease understanding and therapeutic development.
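The surrogate described above is a conditional DDPM. The generic DDPM machinery (Ho et al., 2020) can be sketched in plain Python: a linear noise schedule, the closed-form forward noising step, the simplified noise-prediction loss, and one reverse sampling step. Sequence conditioning is represented by a one-hot encoding passed to a hypothetical `denoiser(x_t, t, cond)` callable, which in practice would be a trained neural network. The schedule constants, the feature values, and all names are assumptions. This is not the authors' architecture or code.

```python
import math
import random


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_t and cumulative products alpha_bar_t."""
    step = (beta_end - beta_start) / (T - 1)
    betas = [beta_start + i * step for i in range(T)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars


def one_hot(seq):
    """Encode a DNA sequence as 4-dimensional one-hot vectors in A, C, G, T order."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    return [[1.0 if index[b] == j else 0.0 for j in range(4)] for b in seq]


def q_sample(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    return xt, eps


def training_loss(denoiser, x0, cond, alpha_bars, rng):
    """Simplified DDPM objective: MSE between the true and predicted noise."""
    t = rng.randrange(len(alpha_bars))
    xt, eps = q_sample(x0, t, alpha_bars, rng)
    eps_hat = denoiser(xt, t, cond)
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(eps)


def p_sample_step(denoiser, xt, t, cond, betas, alpha_bars, rng):
    """One reverse step x_t -> x_{t-1}, using the variance choice sigma_t^2 = beta_t."""
    beta, ab = betas[t], alpha_bars[t]
    eps_hat = denoiser(xt, t, cond)
    mean = [(x - beta / math.sqrt(1.0 - ab) * e) / math.sqrt(1.0 - beta)
            for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # no noise is added at the final step
    return [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean]


def untrained_denoiser(xt, t, cond):
    """Hypothetical placeholder for a trained sequence-conditioned network."""
    return [0.0] * len(xt)


rng = random.Random(0)
betas, alpha_bars = linear_beta_schedule()
cond = one_hot("ACGT")
x0 = [0.1, 0.4, 0.2, 0.0]  # made-up per-position breathing features
print(training_loss(untrained_denoiser, x0, cond, alpha_bars, rng))
```

Generating features for a new sequence would start from Gaussian noise and apply `p_sample_step` for t = T-1 down to 0 with the same `cond`. The cost is then a fixed number of network evaluations rather than a long molecular simulation.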
Full Text Available
Analysis of AlphaFold2 for Modeling Structures of Wildtype and Variant Protein Sequences

https://doi.org/10.29007/5g4v

Kabir, Anowarul; Inan, Toki; Shehu, Amarda (March 2022, EPiC Series in Computing)

ResNet and, more recently, AlphaFold2 have demonstrated that deep neural networks can now predict a tertiary structure of a given protein amino-acid sequence with high accuracy. This seminal development will allow molecular biology researchers to advance various studies linking sequence, structure, and function. Many studies will undoubtedly focus on the impact of sequence mutations on stability, fold, and function. In this paper, we evaluate the ability of AlphaFold2 to predict accurate tertiary structures of wildtype and mutated sequences of protein molecules. We do so on a benchmark dataset used in mutation modeling studies. Our empirical evaluation utilizes global and local structure analyses and yields several interesting observations. It shows, for instance, that AlphaFold2 performs similarly on wildtype and variant sequences. The placement of the main chain of a protein molecule is highly accurate. However, while AlphaFold2 reports similar confidence in its predictions over wildtype and variant sequences, its performance on placements of the side chains suffers in comparison to main-chain predictions. The analysis overall supports the premise that AlphaFold2-predicted structures can be utilized in further downstream tasks, but that further refinement of these structures may be necessary.
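The abstract mentions local structure analyses without naming the metrics. One widely used superposition-free local measure is lDDT. A simplified C-alpha-only version takes every residue pair closer than an inclusion radius (15 Å) in the reference structure and reports the fraction of those distances that the model preserves within 0.5, 1, 2, and 4 Å, averaged over the four thresholds. The sketch below assumes that simplified form (no stereochemistry checks, no side-chain atoms). It is not necessarily the analysis used in the paper, and the coordinates are made up.

```python
import math
from itertools import combinations

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # Angstroms, as in lDDT


def ca_lddt(reference, model, radius=15.0):
    """Simplified C-alpha lDDT between two equal-length coordinate lists.

    Only residue pairs closer than `radius` in the reference are scored; the
    result is the fraction of those distances preserved, averaged over THRESHOLDS.
    """
    if len(reference) != len(model):
        raise ValueError("structures must have the same number of residues")
    preserved = [0] * len(THRESHOLDS)
    total = 0
    for i, j in combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref >= radius:
            continue  # pair is outside the local inclusion radius
        total += 1
        error = abs(math.dist(model[i], model[j]) - d_ref)
        for k, cutoff in enumerate(THRESHOLDS):
            if error < cutoff:
                preserved[k] += 1
    if total == 0:
        return 0.0
    return sum(p / total for p in preserved) / len(THRESHOLDS)


wildtype = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (10.6, 0.0, 0.0)]
print(ca_lddt(wildtype, wildtype), ca_lddt(wildtype, predicted))
```

Because no superposition is involved, a single misplaced region lowers the score only for the pairs it touches. This is what makes such measures useful for the local comparisons the abstract describes. A full-atom variant would be needed to separate main-chain from side-chain accuracy.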
Full Text Available
Protein Decoy Generation via Adaptive Stochastic Optimization for Protein Structure Determination

https://doi.org/10.1109/BIBM49941.2020.9313102

Zaman, Ahmed Bin; Inan, Toki Tahmid; Shehu, Amarda (December 2020, IEEE Intl Conf on Bioinformatics and Biomedicine (BIBM))
Full Text Available
Adaptive Stochastic Optimization to Improve Protein Conformation Sampling

https://doi.org/10.1109/TCBB.2021.3134103

Zaman, Ahmed Bin; Inan, Toki Tahmid; De Jong, Kenneth; Shehu, Amarda (January 2021, IEEE/ACM Transactions on Computational Biology and Bioinformatics)

We have long known that characterizing protein structure is key to understanding protein function. Computational approaches have largely addressed a narrow formulation of the problem, seeking to compute one native structure from an amino-acid sequence. Now AlphaFold2 promises to reveal a high-quality native structure for possibly many proteins. However, researchers over the years have argued for broadening our view to account for the multiplicity of native structures. We now know that many protein molecules switch between different structures to regulate interactions with molecular partners in the cell. Elucidating such structures de novo is exceptionally difficult, as it requires exploring a possibly very large structure space in search of competing, near-optimal structures. Here we report on a novel stochastic optimization method capable of revealing very different structures for a given protein from knowledge of its amino-acid sequence. The method leverages evolutionary search techniques and adapts its exploration of the search space to balance exploration and exploitation under a computational budget. In addition to demonstrating the utility of this method for identifying multiple native structures, we provide a benchmark dataset for researchers to continue work on this problem.
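The abstract does not spell out the algorithm, so the following is only a generic sketch of the idea it describes. It is an evolutionary search that spends a fixed evaluation budget and adapts its mutation step size to trade exploration for exploitation, here via the 1/5th success rule, a standard evolution-strategies heuristic. It then reports several distinct low-energy solutions rather than one. The two-well `toy_energy` is a hypothetical stand-in for a protein energy function, and every name and constant is an assumption. This is not the authors' method.

```python
import math
import random


def toy_energy(x):
    """Hypothetical landscape with two equally deep wells at (-1, 0) and (1, 0)."""
    a = (x[0] + 1.0) ** 2 + x[1] ** 2
    b = (x[0] - 1.0) ** 2 + x[1] ** 2
    return -math.exp(-4.0 * a) - math.exp(-4.0 * b)


def adaptive_search(energy, dim, budget=4000, pop_size=20, sigma=0.5, seed=0):
    """Parallel (1+1)-style lineages sharing a step size tuned by the 1/5th rule."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-2.0, 2.0) for _ in range(dim)] for _ in range(pop_size)]
    fit = [energy(p) for p in pop]
    evals = pop_size
    while evals < budget:
        successes = 0
        for i in range(pop_size):
            if evals >= budget:
                break  # never exceed the computational budget
            child = [xi + rng.gauss(0.0, sigma) for xi in pop[i]]
            f_child = energy(child)
            evals += 1
            if f_child < fit[i]:  # exploitation: a lineage keeps only improvements
                pop[i], fit[i] = child, f_child
                successes += 1
        # Exploration control: widen steps (capped at 1.0) when more than 1/5 of
        # mutations succeed, narrow them otherwise.
        sigma = min(sigma * 1.2, 1.0) if successes > 0.2 * pop_size else sigma * 0.85
    return pop, fit


def distinct_minima(pop, fit, radius=0.5):
    """Greedily keep the lowest-energy solutions that are at least `radius` apart."""
    kept = []
    for x, f in sorted(zip(pop, fit), key=lambda pair: pair[1]):
        if all(math.dist(x, y) >= radius for y, _ in kept):
            kept.append((x, f))
    return kept


pop, fit = adaptive_search(toy_energy, dim=2)
for x, f in distinct_minima(pop, fit):
    print([round(xi, 2) for xi in x], round(f, 3))
```

Keeping independent lineages instead of a single best solution is one simple way to let the search settle into several competing minima. The final distance-based filter then turns the population into a short list of structurally distinct candidates.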
Full Text Available
